DR-0002 Use syntect to extract doc comments from code

#decision #extractor Manuel Woelker 2025-06-22T14:21:06+00:00

Status: Approved
Date: 2025-06-19

Decision

To extract doc comments from source code, we will use the syntect crate.

Context

Extracting doc comments requires locating every comment in a source file, and doing so across many programming languages.

The requirements for this extractor were:

  1. Wide support for various programming language formats
  2. Robustness against invalid code/syntax
  3. Good performance

Consequences

syntect is used to extract doc comments from the code.

To support as many languages as possible, the two-face crate is used alongside it, since it supplies additional syntax definitions for syntect.

Considered Alternatives

Custom lexer

A custom lexer could be implemented to find comments. Given the number of languages and the differences between their syntaxes, this is unlikely to be practical. In particular, correctly ignoring "comment-like" syntax inside string literals would likely require a separate lexer for each language.
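The string-literal problem can be illustrated with a minimal sketch. Both functions below are hypothetical and are not part of the extractor; they assume C-like `//` line comments and double-quoted strings with backslash escapes. The first scans naively for `//`, the second tracks string state for that one syntax family only.

```rust
/// Naive approach: treat everything after the first `//` as a comment.
/// This misfires when `//` appears inside a string literal.
fn naive_line_comment(line: &str) -> Option<&str> {
    line.find("//").map(|idx| &line[idx + 2..])
}

/// String-aware approach (assumption: C-like syntax only).
/// Ignores `//` while inside a double-quoted string and honours
/// backslash escapes such as `\"`.
fn string_aware_line_comment(line: &str) -> Option<&str> {
    let bytes = line.as_bytes();
    let mut in_string = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Inside a string, a backslash escapes the next byte: skip it.
            b'\\' if in_string => i += 1,
            // An unescaped quote opens or closes a string.
            b'"' => in_string = !in_string,
            // `//` outside a string starts a line comment.
            b'/' if !in_string && bytes.get(i + 1) == Some(&b'/') => {
                return Some(&line[i + 2..]);
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

For the input `let url = "http://example.com"; // the real comment`, the naive version starts its "comment" inside the URL, while the string-aware version returns only the trailing comment text. Even this second version covers just one syntax family: it would still mishandle Rust char literals such as `'"'`, raw strings, Python's `#` comments, or triple-quoted strings, which is the per-language effort the paragraph above describes.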

tree-sitter

tree-sitter parsers could be used to extract the comments from source files.

The drawback is that these parsers need to be curated, are platform-specific and are relatively heavyweight.

inkjet

inkjet bundles ~70 tree-sitter parsers for various languages.

The downside of this approach is that all of these parsers must be compiled, which makes compilation much slower, and bundled into the binary, which makes the binary much larger.

extractor/src/extractor.rs:1